Bayesian Statistics

1. Principle of Maximum Entropy
- 1.1. Principle of Indifference
2. Prior Probability
- 2.1. Strength
3. Bayes' Theorem
- 3.1. Statement
4. Conjugate Distribution
5. Bayes Estimator
6. Maximum A Posteriori Probability Estimator
- 6.1. Description
7. References

Statistics under the Bayesian interpretation of probability, in which a probability expresses a degree of belief that is updated as evidence is observed.

1. Principle of Maximum Entropy

The probability distribution that best represents the current state of knowledge about a system is the one with the largest entropy among all distributions consistent with that knowledge.

1.1. Principle of Indifference

In the absence of any evidence, the credence—the degree of belief—should be equally distributed among all possible outcomes.
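For a finite set of outcomes with no constraint other than normalization, the two principles above agree: the uniform distribution has the largest Shannon entropy. The following is a minimal sketch that checks this numerically for a six-sided die. The `entropy` helper and the `skewed` distribution are illustrative assumptions, not part of the original text.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list of probabilities."""
    return -sum(q * math.log(q) for q in p if q > 0)

# Fair die: credence spread equally over the six outcomes.
uniform = [1 / 6] * 6
# Hypothetical loaded die, used only for comparison.
skewed = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

h_uniform = entropy(uniform)  # equals log(6), the maximum for six outcomes
h_skewed = entropy(skewed)    # strictly smaller than log(6)

print(h_uniform, h_skewed)
```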

2. Prior Probability

The probability distribution assigned to a quantity before the evidence is taken into account.

2.1. Strength

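The degree of certainty that the prior expresses about the system. A strong prior changes little when it is updated with new evidence, whereas a weak prior is quickly dominated by the data.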
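As a rough illustration of prior strength, the sketch below updates two Beta priors with the same prior mean on the same coin-flip data. The Beta-Bernoulli model, the parameter values, and the function name `beta_posterior_mean` are assumptions chosen for the example.

```python
def beta_posterior_mean(a, b, successes, failures):
    """Posterior mean of a Bernoulli parameter under a Beta(a, b) prior."""
    return (a + successes) / (a + b + successes + failures)

# Hypothetical data: 8 successes and 2 failures.
successes, failures = 8, 2

# Both priors have mean 0.5; they differ only in how concentrated they are.
weak = beta_posterior_mean(1, 1, successes, failures)      # Beta(1, 1), weak prior
strong = beta_posterior_mean(50, 50, successes, failures)  # Beta(50, 50), strong prior

print(weak, strong)
```

With these numbers the weak prior moves to 9/12 = 0.75, while the strong prior moves only to 58/110, which stays close to its prior mean of 0.5.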

3. Bayes' Theorem

Bayes' Law, Bayes' Rule

3.1. Statement

\[ \operatorname{P}[A|B] = \frac{\operatorname{P}[B|A]\operatorname{P}[A]}{\operatorname{P}[B]} \] where \(A\) and \(B\) are events, \(\operatorname{P}\) denotes probability, and \(\operatorname{P}[B] \neq 0\).
According to the Bayesian probability interpretation:
- \(\operatorname{P}[A|B]\) is the posterior probability of \(A\) given \(B\).
- \(\operatorname{P}[B|A]\) is the likelihood of \(A\) given a fixed \(B\), since \(\operatorname{P}[B|A] = \operatorname{L}[A|B]\).
- \(\operatorname{P}[A]\) is the prior probability of \(A\).
- \(\operatorname{P}[B]\) is the marginal probability of \(B\), also called the evidence.
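The theorem can be sketched numerically for a binary event \(A\), where the marginal \(\operatorname{P}[B]\) is expanded by the law of total probability. The diagnostic-test numbers and the function name `posterior` below are hypothetical, chosen only to make the arithmetic concrete.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P[A|B] from P[A], P[B|A], and P[B|not A], for a binary event A."""
    # Marginal P[B] by the law of total probability.
    marginal = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / marginal

# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, likelihood_given_not=0.05)

print(p)  # 0.0095 / 0.059, roughly 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior probability is small.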

4. Conjugate Distribution

If the prior distribution and the posterior distribution belong to the same family of probability distributions, then the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.
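A standard example of conjugacy is the Beta prior with a Bernoulli likelihood: the posterior is again a Beta distribution, so updating reduces to adding counts to the parameters. The sketch below assumes this model; `update_beta` and the data are illustrative names and values.

```python
def update_beta(a, b, data):
    """Return the Beta posterior parameters after observing a list of 0/1 outcomes."""
    successes = sum(data)
    failures = len(data) - successes
    return a + successes, b + failures

# Start from a Beta(2, 2) prior and update in two batches.
a1, b1 = update_beta(2, 2, [1, 0, 1, 1])  # posterior is Beta(5, 3)
a2, b2 = update_beta(a1, b1, [0, 0])      # posterior is Beta(5, 5)

# Updating once with all the data gives the same posterior.
a_all, b_all = update_beta(2, 2, [1, 0, 1, 1, 0, 0])

print((a1, b1), (a2, b2), (a_all, b_all))
```

Because the posterior stays in the Beta family, it can serve as the prior for the next batch of data.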

5. Bayes Estimator

Bayes Action

An estimator or decision rule that minimizes the posterior expected value of a loss function (the posterior expected loss).
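Under squared-error loss, the minimizer of the posterior expected loss is the posterior mean. The sketch below checks this by brute force on a small discrete posterior. The grid `thetas`, the weights `post`, and the helper `expected_loss` are hypothetical values and names used only for illustration.

```python
# Hypothetical discrete posterior over five parameter values (weights sum to 1).
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
post = [0.05, 0.15, 0.30, 0.35, 0.15]

def expected_loss(action, loss):
    """Posterior expected loss of choosing `action`."""
    return sum(w * loss(action, t) for t, w in zip(thetas, post))

def squared(a, t):
    """Squared-error loss."""
    return (a - t) ** 2

# Search a fine grid of candidate actions for the smallest expected loss.
candidates = [i / 1000 for i in range(1001)]
bayes_action = min(candidates, key=lambda a: expected_loss(a, squared))

posterior_mean = sum(t * w for t, w in zip(thetas, post))

print(bayes_action, posterior_mean)
```

Other loss functions give other Bayes estimators. For example, absolute-error loss leads to a posterior median.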

6. Maximum A Posteriori Probability Estimator

MAP Estimator

6.1. Description

The maximum likelihood estimate of \(\theta\): \[ \hat{\theta}_{\rm MLE}(x) = \operatorname*{arg\ max}_{\theta} f(x\mid\theta) \] can be generalized to include the prior distribution \(g(\theta)\) using Bayes' theorem:

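\begin{align*} \hat{\theta}_{\rm MAP}(x) &= \operatorname*{arg\ max}_{\theta}\frac{f(x\mid \theta)g(\theta)}{\int_\Theta f(x\mid\vartheta)g(\vartheta)\,d\vartheta} \\[10pt] &= \operatorname*{arg\ max}_{\theta} f(x\mid \theta)g(\theta). \end{align*}

The denominator is the marginal density of \(x\); it does not depend on \(\theta\), so it can be dropped from the maximization.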
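The difference between the two estimates can be sketched for a Bernoulli likelihood with a Beta prior, maximizing \(\log f(x\mid\theta)\) and \(\log f(x\mid\theta) + \log g(\theta)\) over a grid. The model, the data, the prior parameters, and the function names are assumptions made for this example; normalizing constants are omitted because they do not affect the arg max.

```python
import math

def log_likelihood(theta, k, n):
    """Log of f(x | theta) for k successes in n Bernoulli trials, up to a constant."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

def log_prior(theta, a, b):
    """Log of a Beta(a, b) density g(theta), up to a constant."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

k, n = 8, 10  # hypothetical data: 8 successes in 10 trials
a, b = 3, 3   # hypothetical Beta(3, 3) prior

# Grid over the open interval (0, 1) to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]

theta_mle = max(grid, key=lambda t: log_likelihood(t, k, n))
theta_map = max(grid, key=lambda t: log_likelihood(t, k, n) + log_prior(t, a, b))

print(theta_mle, theta_map)
```

For this model the closed forms are \(k/n = 0.8\) for the MLE and \((a + k - 1)/(a + b + n - 2) = 10/14\) for the MAP estimate, so the prior pulls the estimate toward 0.5. With a flat Beta(1, 1) prior the two estimates coincide.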